

Section: New Results

Data Management

Top-k Query Processing Over Encrypted Data in Clouds

Participants : Sakina Mahboubi, Reza Akbarinia, Patrick Valduriez.

Cloud data outsourcing provides users and companies with powerful capabilities to store and process their data in third-party data centers. However, cloud providers do not guarantee the privacy of the outsourced data. One solution for protecting user data against security attacks is to encrypt the data before sending it to the cloud servers. The main problem then becomes evaluating user queries over the encrypted data.

In [41], we address the problem of top-k query processing over encrypted data, and propose an efficient approach called BuckTop. Our approach uses the bucketization technique to manage the encrypted data on the remote server. It includes a top-k query processing algorithm that works on the encrypted data of the buckets and returns a set that contains the encrypted top-k results. It also has a filtering algorithm that efficiently eliminates false positives on the server side. We implemented BuckTop, and compared its response time for processing top-k queries over encrypted data with that of the TA algorithm over the original (plaintext) data. Our results show that the response time of BuckTop over encrypted data is close to that of TA over plaintext data.
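For readers unfamiliar with the plaintext baseline, the following is a minimal sketch of a threshold-style top-k algorithm, assuming that TA here refers to the classic threshold algorithm over sorted lists. It is not BuckTop and involves no encryption. The function name, the dictionary-based input format, the sum as scoring function and the non-negative scores are all assumptions made for illustration.

```python
import heapq


def threshold_algorithm(lists, k):
    """Return the k items with the highest overall score.

    Assumptions (illustrative only): `lists` is a list of dicts mapping
    item -> non-negative local score, and the overall score of an item
    is the sum of its local scores (0 where the item is missing).
    """
    # Sorted access: each list is read in descending order of local score.
    sorted_lists = [sorted(l.items(), key=lambda kv: -kv[1]) for l in lists]
    seen = {}  # item -> overall score, filled in by random access
    depth = max((len(s) for s in sorted_lists), default=0)

    for pos in range(depth):
        last_scores = []
        for s in sorted_lists:
            if pos >= len(s):
                last_scores.append(0.0)  # list exhausted
                continue
            item, score = s[pos]
            last_scores.append(score)
            if item not in seen:
                # Random access: look the item up in every list.
                seen[item] = sum(l.get(item, 0.0) for l in lists)

        # No unseen item can have an overall score above this threshold.
        threshold = sum(last_scores)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top  # early stop: the k-th best score reaches the threshold

    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


# Toy data: two lists of local scores for three items.
example_lists = [
    {"a": 0.9, "b": 0.8, "c": 0.1},
    {"a": 0.2, "b": 0.7, "c": 0.9},
]
top2 = threshold_algorithm(example_lists, k=2)
top2_items = [item for item, _ in top2]
```

The early-stop test is what makes the algorithm attractive over plaintext: it can return before scanning the lists completely. Over encrypted data the server cannot compare scores directly, which is the gap that a bucket-based approach has to work around.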

End-to-end Graph Mapper

Participants : Benjamin Billet, Didier Parigot, Patrick Valduriez.

The growth of linked data in web and mobile applications motivates software developers to model their business data as graphs, enabling them to leverage the capabilities of various graph databases. Going one step further, we introduce an End-to-end Graph Mapper (EGM) [22] for modeling the whole application as (i) a set of graphs representing the business data, the in-memory data structure maintained by the application and the user interface (a tree of graphical components), and (ii) a set of standardized mapping operators that map these graphs to each other. As a benefit, the application becomes a complex live query over multiple graph databases, which makes the development process simpler and safer thanks to the automation of repetitive development tasks. This work has been done in collaboration with Beepeers (http://www.beepeers.com), a startup that develops and markets social network mobile applications for small communities, in the context of the Triton I-lab.

Management of Simulation Data

Participants : Vitor Silva, Patrick Valduriez.

In complex simulations, users must track quantities of interest (residuals, error estimates, etc.) to control the execution as much as possible. However, this tracking is typically done only after the simulation ends. We are designing techniques to extract, index and relate strategic simulation data for online queries while the simulation is running. We consider coupling these techniques with widely adopted libraries such as libMesh (for numerical solvers) and ParaView (for visualization), so that queries on quantities of interest are enhanced by visualization and provenance data. We plan to support interactive data analysis both post-simulation and at runtime (in situ and in transit), taking advantage of memory access at runtime.
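The idea of querying quantities of interest while the simulation is still running can be illustrated with a minimal sketch. Everything in it is hypothetical: the `QoITracker` class, the toy solver whose residual halves at each step, and the convergence callback are illustrative assumptions, and none of them corresponds to the libMesh or ParaView APIs.

```python
from collections import defaultdict


class QoITracker:
    """Hypothetical in-memory index of quantities of interest (QoI)."""

    def __init__(self):
        self._series = defaultdict(list)  # name -> [(step, value), ...]

    def record(self, step, name, value):
        """Called by the simulation at each step."""
        self._series[name].append((step, value))

    def latest(self, name):
        """Most recent (step, value) pair, or None if nothing was recorded."""
        series = self._series[name]
        return series[-1] if series else None

    def steps_where(self, name, predicate):
        """Online query: steps at which the value satisfies the predicate."""
        return [step for step, value in self._series[name] if predicate(value)]


def run_simulation(tracker, n_steps, on_step=None):
    """Toy solver (assumption): the residual is halved at every step."""
    residual = 1.0
    for step in range(n_steps):
        residual *= 0.5
        tracker.record(step, "residual", residual)
        # The callback queries the tracker while the run is in progress
        # and may ask the simulation to stop early.
        if on_step is not None and on_step(step, tracker):
            return step
    return n_steps - 1


def stop_when_converged(step, tracker):
    _, value = tracker.latest("residual")
    return value < 1e-3


tracker = QoITracker()
last_step = run_simulation(tracker, 100, on_step=stop_when_converged)
large_residual_steps = tracker.steps_where("residual", lambda v: v > 0.1)
```

Because the query runs inside the loop, the run can be stopped or tuned as soon as the residual meets the criterion, rather than after all 100 steps have completed.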

In [21], we propose a solution (architecture and algorithms) that combines the advantages of a dataflow-aware scientific workflow management system (SWfMS) with raw data file analysis techniques, to allow queries on raw data file elements that are related but reside in separate files. The architecture, called ARMFUL, has three main components: a raw data extractor, a provenance gatherer and a query processing interface, all of which are dataflow aware. We show ARMFUL instantiated with the Chiron SWfMS.
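To make the notion of querying related elements that reside in separate files more concrete, here is a minimal sketch using an in-memory SQLite database. The two tables, the file names, the element names and the values are all hypothetical. They stand in for what an extractor and a provenance gatherer might record, and they are not the actual ARMFUL schema or the Chiron API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Provenance (assumed schema): which task produced which raw file.
    CREATE TABLE provenance (task_id INTEGER, file_name TEXT);
    -- Extraction (assumed schema): elements pulled out of each raw file.
    CREATE TABLE extracted (file_name TEXT, element TEXT, value REAL);
    """
)

# Made-up data: each task writes a mesh file and a solver output file.
conn.executemany(
    "INSERT INTO provenance VALUES (?, ?)",
    [(1, "mesh_1.dat"), (1, "solver_1.out"),
     (2, "mesh_2.dat"), (2, "solver_2.out")],
)
conn.executemany(
    "INSERT INTO extracted VALUES (?, ?, ?)",
    [("mesh_1.dat", "cells", 1000.0), ("solver_1.out", "residual", 0.0001),
     ("mesh_2.dat", "cells", 8000.0), ("solver_2.out", "residual", 0.03)],
)

# Relate elements from different files through the task that produced them:
# mesh size of every task whose residual is still above 0.01.
rows = conn.execute(
    """
    SELECT p1.task_id, m.value AS cells, r.value AS residual
    FROM extracted AS m
    JOIN provenance AS p1 ON p1.file_name = m.file_name
    JOIN provenance AS p2 ON p2.task_id = p1.task_id
    JOIN extracted AS r ON r.file_name = p2.file_name
    WHERE m.element = 'cells'
      AND r.element = 'residual'
      AND r.value > 0.01
    ORDER BY p1.task_id
    """
).fetchall()
```

The provenance table is what links the two files: without it, the mesh size and the residual would be unrelated values in separate raw files.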

In [31], we instantiate ARMFUL without the SWfMS, plugging the components directly into the simulation code of highly optimized parallel applications. With support for sophisticated online data analysis, scientists get a detailed view of the execution, which provides insights for determining when and how to tune parameters.

We have also started investigating the combination of in-transit analysis and visualization, with the development of SAVIME (Scientific Analysis and Visualization In-Memory). The system adopts a multi-dimensional data model, TARS (Typed Array Schema) [29], that enables the representation of simulation output data, the topology mesh and simulation metadata. Data produced by the simulation is ingested into the system without any transformation as a Typed Array (TAR). We intend SAVIME to implement an algebra on TARs that enables the analysis of simulation output and the direct production of visualization output.